Repairing of missing bus arrival data based on DBSCAN algorithm and multi-source data
WANG Cheng, CUI Ziwei, DU Zilin, GAO Yueer
Journal of Computer Applications, 2019, 39(11): 3184-3190. DOI: 10.11772/j.issn.1001-9081.2019051033
Existing methods for repairing missing bus arrival information consider few factors, achieve low accuracy and are not robust. To address these problems, a method for repairing missing bus arrival data based on the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm and multi-source data was proposed. Bus GPS (Global Positioning System) data, IC (Integrated Circuit) card data and other source data were used to repair the missing arrival information.

The name, longitude and latitude of a station with missing arrival data were repaired through association analysis between the complete arrival data and the static route information. The missing arrival time was repaired in three steps. Firstly, for each station with missing data and its nearest station without missing data, the travel times and schedules between the two stations in the historical complete arrival data were clustered with the DBSCAN algorithm. Secondly, the method judged whether the two adjacent runs of the studied bus that had complete data belonged to the same cluster: if they did, the cluster was kept unchanged; otherwise, the two clusters were merged. Finally, the maximum travel time corresponding to the cluster midpoint was taken as the missing travel time and used to determine whether any passenger boarded the bus at this station by swiping a card. If so, the arrival time was calculated from the card-swiping time; if not, the mean of the maximum and minimum travel times corresponding to the cluster midpoint was used as the missing travel time to calculate the arrival time.

Taking Xiamen bus arrival data as an example, in the repair of the name, longitude and latitude of the missing arrival station, the clustering method based on GPS data, the maximum probability estimation method and the proposed method all achieve a repair rate of 100.00%.
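The three-step arrival-time repair described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumed here: each historical record is a (schedule, travel time) pair in minutes; `eps=3.0` and `min_pts=3` are arbitrary; an adjacent run is assigned to the cluster of its nearest clustered historical point; "merging" two clusters means pooling their records; the nearest station with data lies upstream; the first card swipe inside the window approximates the arrival; and "the cluster midpoint" is read simply as the selected cluster's maximum and minimum travel times. The names `dbscan`, `nearest_label` and `repair_arrival_time` are hypothetical.

```python
# Sketch under stated assumptions; not the paper's code.
from math import hypot

NOISE = -1  # label DBSCAN gives to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN over 2-D points; returns one cluster label per point."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # All points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j in range(n) if hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = region(j)
            if len(reach) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reach)
        cluster += 1
    return labels


def nearest_label(run, points, labels):
    """Hypothetical rule: a run takes the label of the closest clustered historical point."""
    return min(
        (hypot(p[0] - run[0], p[1] - run[1]), lab)
        for p, lab in zip(points, labels)
        if lab != NOISE
    )[1]


def repair_arrival_time(history, prev_run, next_run, t_known, swipes, eps=3.0, min_pts=3):
    """Estimate the arrival time (minutes of day) at a station with missing data.

    history  -- (schedule, travel_time) pairs from complete historical data for the station pair
    prev_run -- (schedule, travel_time) of the studied bus's preceding complete run
    next_run -- (schedule, travel_time) of its following complete run
    t_known  -- arrival time at the nearest station with data (assumed upstream)
    swipes   -- IC card swipe times recorded at the missing station
    """
    labels = dbscan(history, eps, min_pts)
    # Same cluster -> the set holds one label; different clusters -> both are pooled (merged).
    merged = {nearest_label(prev_run, history, labels), nearest_label(next_run, history, labels)}
    travel = [tt for (_, tt), lab in zip(history, labels) if lab in merged]
    t_max, t_min = max(travel), min(travel)

    # The maximum travel time bounds the window in which boarding swipes are searched.
    boarded = [s for s in swipes if t_known <= s <= t_known + t_max]
    if boarded:
        return min(boarded)  # assumption: the first swipe approximates the arrival
    return t_known + (t_max + t_min) / 2  # fall back to the mean of max and min travel time


# Hypothetical history: a morning group, a midday group and one outlier.
history = [(420, 8.0), (421, 8.5), (422, 7.5), (423, 8.25),
           (600, 5.0), (601, 5.5), (602, 4.75), (603, 5.25),
           (900, 20.0)]
print(dbscan(history, 3.0, 3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(repair_arrival_time(history, (419, 8.0), (431, 8.0), 425.0, [431.0, 432.5, 440.0]))  # 431.0
print(repair_arrival_time(history, (419, 8.0), (431, 8.0), 425.0, []))  # 433.0
```

With a swipe inside the window the repaired time is the first swipe; without one it falls back to the midpoint of the cluster's travel-time range. For real data volumes a library clustering routine such as scikit-learn's `DBSCAN` could replace the hand-written O(n^2) version.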
In the repair of the missing arrival time, the mean relative error of the proposed method is 0.0301% and 0.0004% lower than those of the two comparison methods respectively, and its correlation coefficient is 0.005 and 0.0075 higher than those of the two comparison methods respectively. The simulation results show that the proposed method effectively improves the accuracy of repairing missing bus arrival data and reduces the impact of the number of missing stations on accuracy.